Effects of word and character frequencies on Chinese character writing
Authors
Abstract
Similar resources
SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
BACKGROUND Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. METHODOLOGY Following recent work by New, Brysbaert, and colleagues in English, French and Du...
Learning Character Representations for Chinese Word Segmentation
We propose a simple yet effective semi-supervised method for improving Chinese Word Segmentation. Our method is based on learning generalizable vector and cluster representations of variable-length character sequences from large unlabeled data, which is then incorporated into a sequence labeling model with the passive-aggressive algorithm as features. We achieve state-of-the-art results on the ...
Chinese Word Segmentation as Character Tagging
In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign to Chinese characters, or hanzi, tags that indicate the position of a hanzi within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach...
Off-line Character Recognition using On-line Character Writing Information
Recognition of variously deformed character patterns is a salient subject for off-line hand-printed character recognition. Sufficient recognition performance for practical use has not been achieved despite reports of many recognition techniques. Our research examines effective recognition techniques for deformed characters, extending conventional recognition techniques using an on-line characte...
Pragmatic Chinese Lexical Analysis Based on Word-character Hybrid Model
In the field of information and natural language processing, Chinese lexical analysis is an important basic step in processing Chinese, Japanese, and other Asian languages. This paper presents a Chinese lexical analysis method that integrates word-level and character-level information using a hybrid model combining a word-based CRF model and a latent semi-CRF model. The word-lattice, which represents all candidate outputs, is...
Journal
Journal title: Frontiers in Human Neuroscience
Year: 2017
ISSN: 1662-5161
DOI: 10.3389/conf.fnhum.2017.223.00054